PALMA: mRNA to genome alignments using large margin algorithms
نویسندگان
چکیده
MOTIVATION Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. RESULTS We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm-called PALMA-tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from Caenorhabditis elegans and human, it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. AVAILABILITY Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and python are available at http://www.fml.mpg.de/raetsch/projects/palma
منابع مشابه
PALMA: Perfect Alignments using Large Margin Algorithms
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a conve...
متن کاملWhole Genome Alignments and Synteny Maps
IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of short DNA sequences. There are a large number of algorithms and methods developed for sequence alignment, in order to find conserved sequences across species. Smith & Waterman and Needleman & Wunsch algorithms are among the most popular...
متن کاملBLAT--the BLAST-like alignment tool.
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. Th...
متن کاملLiquid-liquid equilibrium data prediction using large margin nearest neighbor
Guanidine hydrochloride has been widely used in the initial recovery steps of active protein from the inclusion bodies in aqueous two-phase system (ATPS). The knowledge of the guanidine hydrochloride effects on the liquid-liquid equilibrium (LLE) phase diagram behavior is still inadequate and no comprehensive theory exists for the prediction of the experimental trends. Therefore the effect the ...
متن کاملComputational methods for Multiple Genome Alignment and Synteny detection
Multiple genome alignments are useful to detect synteny, gene order, and large-scale genomic re-arrangements which help to understand genome evolution, divergence and the development of protein functions. However, aligning multiple whole genomes is very computationally intensive [3] and many genomes are only partially complete. Fast approximation algorithms have been developed to handle both th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 23 15 شماره
صفحات -
تاریخ انتشار 2007